Detection of Escherichia coli O157:H7 in Ground Beef Using Long-Read Sequencing

Foodborne pathogens are a significant cause of illness, and infection with Shiga toxin-producing Escherichia coli (STEC) may lead to life-threatening complications. The current methods to identify STEC in meat involve culture-based, molecular, and proteomic assays and take at least four days to complete. This time could be reduced by using long-read whole-genome sequencing to identify foodborne pathogens. Therefore, the goal of this project was to evaluate the use of long-read sequencing to detect STEC in ground beef. The objectives of the project included establishing optimal sequencing parameters, determining the limit of detection of all STEC virulence genes of interest in pure cultures and spiked ground beef, and evaluating selective sequencing to enhance STEC detection in ground beef. Sequencing libraries were run on the Oxford Nanopore Technologies’ MinION sequencer. Optimal sequencing output was obtained using the default parameters in MinKNOW, except for setting the minimum read length to 1 kb. All genes of interest (eae, stx1, stx2, fliC, wzx, wzy, and rrsC) were detected in DNA extracted from STEC pure cultures within 1 h of sequencing, and 30× coverage was obtained within 2 h. All virulence genes were confidently detected in STEC DNA quantities as low as 12.5 ng. In STEC-inoculated ground beef, software-controlled selective sequencing improved virulence gene detection; however, several virulence genes were not detected due to high bovine DNA concentrations in the samples. The growth enrichment of inoculated meat samples in mTSB resulted in a 100-fold increase in virulence gene detection as compared to the unenriched samples. The results of this project suggest that further development of long-read sequencing protocols may result in a faster, less labor-intensive method to detect STEC in ground beef.


Introduction
Foodborne pathogens, such as Escherichia coli, Salmonella spp., Campylobacter spp., and Listeria monocytogenes, remain a major cause of disease globally [1]. The World Health Organization estimates that each year, one in ten people worldwide will be sickened by a foodborne pathogen, and 420,000 people will die [2]. In the United States (U.S.), an estimated 48 million people become ill each year [3]. Additionally, pathogen contamination of food is a significant economic burden estimated to cost the world economy USD 110 billion [2] and the U.S. economy USD 17 billion annually [4]. During 2021, over 15 million pounds of meat were recalled in the U.S., and Shiga toxin-producing Escherichia coli (STEC) was the cause of two of those recalls, totaling 300,096 pounds [5]. Infections by STEC have been increasing since 2018 and have an incidence rate of 5.7 per 100,000 people [6]. An STEC infection generally causes diarrhea and vomiting but may result in severe diseases such as hemorrhagic colitis or hemolytic uremic syndrome [7].
The isolation and identification of STEC as an adulterant in meat by the U.S. Department of Agriculture Food Safety and Inspection Service (USDA FSIS) is achieved through a combination of culturing, molecular methods, O typing, and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry [8]. Food samples are considered adulterated if they test positive for the stx and eae genes and one of the seven targeted O antigens commonly associated with STECs isolated from symptomatic patients [8]. The stx1 and stx2 genes encode Shiga toxins 1 and 2, which cause cytotoxicity in the host, and expression of eae produces intimin, which mediates enterocyte colonization [7]. The serotype most frequently associated with foodborne STEC outbreaks is O157:H7 [6]. The serotype of E. coli is determined by the H antigen, which is present on the flagella, and the O antigen, which is present on the outer membrane [9]. The current culture-based identification process takes at least four days to complete [8]. Whole-genome sequencing could reduce the amount of time and labor needed for foodborne pathogen identification.
Advances in whole-genome sequencing technology have led to third-generation, or long-read, sequencing that could significantly reduce the amount of time needed to identify foodborne pathogens compared to the current culture-based methods. Oxford Nanopore Technologies' MinION device sequences RNA or DNA by detecting changes in electrical current as the strands of nucleic acid pass through nanopores on a flow cell [10]. The long reads generated facilitate genome assembly [11], while their real-time analysis allows pathogen detection to be accomplished in hours instead of days [12]. The small, portable sequencers allow whole-genome analysis to be conducted outside of traditional laboratories, and the cost is generally lower than second-generation sequencing. Additionally, significant progress has been made to reduce errors in nanopore sequencing, and raw read accuracy has improved to >99.9% [13]. A previous in silico study by our research group also suggested that long-read sequencing would be practical for testing food for E. coli and L. monocytogenes contamination after growth enrichment [14].
The goal of this project was to evaluate the potential of long-read whole-genome sequencing for STEC detection. The objectives included establishing optimal sequencing parameters, determining the limit of detection of all STEC virulence genes of interest in pure culture and STEC-inoculated ground beef, and assessing the ability of software-controlled enrichment and depletion of specific genomic material to enhance the detection of STEC in inoculated meat.

STEC-Inoculated Ground Beef
Ground beef spiked with E. coli O157:H7 was processed according to USDA FSIS methods [8]. The following treatments were included: media only; uninoculated meat; 1 × 10⁷ cfu mL⁻¹ E. coli; and ground beef inoculated with 1 × 10⁵ cfu g⁻¹, 1 × 10⁶ cfu g⁻¹, or 1 × 10⁷ cfu g⁻¹ of E. coli. Treatments were prepared by placing 1 ± 0.01 g of ground beef on one side of a sterile 7 oz Whirl-Pak® (Austin, TX, USA) strainer bag (except for the media and E. coli only controls) and then diluting 1:4 with mTSB. Each bag was stomached for 120 s with a Bag Mixer (Spiral Biotech Inc., Norwood, MA, USA). Two experiments were conducted, one without and one with growth enrichment, each with triplicate samples. The bags in the enrichment experiment were incubated statically for 24 h at 42 ± 1 °C. The samples were filtered through a 40 µm cell strainer (Greiner Bio-One North America Inc., Monroe, NC, USA) and then centrifuged at 130× g for 10 min. The supernatant was retained. An aliquot was plated on rainbow agar (Biolog Inc., Hayward, CA, USA) modified with potassium tellurite (Thermo Fisher Scientific, Waltham, MA, USA), novobiocin (RPI), and cefixime (RPI) (mRBA) to determine the concentration of E. coli. The remaining supernatant was centrifuged at 3400× g for 20 min. The supernatant was discarded, and the pellet was washed in 1 mL of phosphate-buffered saline (PBS; Boston Bioproducts Inc., Ashland, MA, USA) and then centrifuged at 12,000× g for 1 min. The supernatant was discarded, and the pellet was retained for DNA extraction.

DNA Extractions
The DNA from pure cultures of E. coli was extracted with a Monarch High-Molecular-Weight DNA Extraction Kit (New England BioLabs Inc., Ipswich, MA, USA) according to the manufacturer's instructions. The pellets from the ground beef experiments were extracted using a Qiagen DNeasy PowerFood Microbial Kit (Qiagen, Germantown, MD, USA) according to the manufacturer's instructions. DNA concentration and quality measurements were taken with a Denovix DS-11 FX+ spectrophotometer.

MinION Sequencing
Libraries of E. coli O157:H7 DNA were prepared using a Field Sequencing Kit (SQK-LRK001, Oxford Nanopore Technologies [ONT], Oxford, UK) according to the manufacturer's instructions. A flow cell check was performed prior to sequencing to ensure that enough pores were available for sequencing. A MinION Mk1B or Mk1C (ONT) was used with R9 flow cells (ONT). The optimal sequencing run time for the DNA extracted from a pure culture of E. coli O157:H7 was determined by testing the following time points in triplicate: 1 h, 2 h, 4 h, and 6 h. The limit of detection (LOD) for identifying target virulence genes in E. coli O157:H7 DNA was determined by testing the following DNA input quantities in triplicate: 400 ng, 200 ng, 100 ng, 50 ng, 25 ng, 12.5 ng, 6.25 ng, 3.125 ng, 1.56 ng, 0.78 ng, and 0.39 ng. The 400 ng input was sequenced with a 1 h run duration, and the 200 ng input was sequenced with a 2.5 h duration. The duration of the sequencing run for the remaining inputs was determined using nonlinear regression to approximate the amount of time needed to obtain 400 k reads, which was the average number of reads generated during a 1 h run time with optimal (400 ng) DNA input. The DNA from spiked ground beef experiments was sequenced for 24 h. In some of the meat sequencing runs, the software-controlled depletion of the Bos taurus (domestic bovine) genome or the enrichment of the E. coli O157:H7 genome was employed. For depletion, the reference B. taurus genome (NCBI Accession #NC_037338.1) was uploaded into the MinKNOW software (ONT, version 22.03.6), and software-controlled depletion was enabled. For enrichment, the E. coli O157:H7 reference genome (NCBI Accession #NC_002695.5) was uploaded into the MinKNOW software, and software-controlled enrichment was enabled.
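The DNA inputs listed above form a two-fold serial dilution starting from the 400 ng recommended input. The following minimal Python sketch reproduces that series; the function name `twofold_series` is hypothetical and chosen only for illustration. The unrounded values 1.5625, 0.78125, and 0.390625 ng correspond to the 1.56, 0.78, and 0.39 ng inputs reported in the text.

```python
def twofold_series(start_ng: float, steps: int) -> list[float]:
    """Return `steps` DNA inputs (ng), each half of the previous one."""
    return [start_ng / (2 ** k) for k in range(steps)]


# Eleven inputs, from the 400 ng recommended input down to ~0.39 ng.
series = twofold_series(400.0, 11)

for ng in series:
    print(f"{ng:g} ng")
```

The sketch covers only the dilution arithmetic. The run durations for the lower inputs were estimated by nonlinear regression in the study, and that model is not reproduced here.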

Quantitative Real-Time Polymerase Chain Reaction
Quantitative real-time polymerase chain reaction (qPCR) was used to confirm the presence of the fliC, stx, eae, and rrsC genes from E. coli O157:H7 following established USDA FSIS protocols [15]. A StepOne Real-Time PCR System (Applied Biosystems, Waltham, MA, USA) or QuantStudio 5 Real-Time PCR System (Applied Biosystems) was used for qPCR.

Data Analysis
Sequencing data were basecalled in real time or post-run with MinKNOW software using fast or high-accuracy basecalling and a minimum read length filter of 1 kb. The FastQ files were imported into Geneious Prime software (version 2023) and aligned to an E. coli O157:H7 reference genome (NCBI Accession #NC_002695.5) using Minimap2 (version 2.24). The target genes, namely fliC, eae, stx1a, stx1b, stx2a, stx2b, rrsC, wzx, and wzy, were searched for in the alignment, and the number of times each gene was detected was recorded. The mean, standard deviation, or standard error of triplicate runs was determined for sequencing run parameters, target gene detection, and qPCR Ct values.
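The counting and summary steps can be illustrated with a short Python sketch. It is not the Geneious Prime workflow used in the study. It assumes that a gene counts as "detected" once for every aligned read whose reference interval overlaps the gene's interval, which is one plausible reading of the procedure. The function names, the gene coordinates, and the replicate counts below are all hypothetical placeholders, not study data.

```python
from math import sqrt
from statistics import mean, stdev


def count_gene_hits(alignments, genes):
    """Count aligned reads overlapping each gene.

    alignments: list of (ref_start, ref_end) half-open intervals, one per read.
    genes: dict mapping gene name -> (start, end) half-open interval.
    """
    hits = {name: 0 for name in genes}
    for a_start, a_end in alignments:
        for name, (g_start, g_end) in genes.items():
            # Two half-open intervals overlap if each starts before the other ends.
            if a_start < g_end and g_start < a_end:
                hits[name] += 1
    return hits


def summarize(values):
    """Return (mean, sample standard deviation, standard error)."""
    m = mean(values)
    sd = stdev(values)
    return m, sd, sd / sqrt(len(values))


# Placeholder coordinates and read intervals (illustrative only).
genes = {"geneA": (100, 200), "geneB": (500, 900)}
alignments = [(0, 150), (180, 600), (900, 1200)]
hits = count_gene_hits(alignments, genes)

# Placeholder per-replicate detection counts for one gene.
m, sd, se = summarize([28, 30, 32])
print(hits, m, sd, round(se, 3))
```

Counting overlapping reads is more permissive than requiring a read to span the full gene. A stricter rule would replace the overlap test with `a_start <= g_start and g_end <= a_end`.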

Optimal Sequencing Parameters
The optimal sequencing output was obtained using the default settings in MinKNOW, with the exception of the minimum read length, which was set to 1 kb. For DNA extractions from pure cultures of E. coli O157:H7, a sequencing run time of 1 h was sufficient to detect the targeted virulence genes an average of 30.18 times (Table 1). However, due to variability between sequencing runs, as can be seen in the standard deviations in Table 1, a run time of 3 h was selected for the sequencing of pure bacterial cultures to ensure that enough data were generated.

Limit of Detection
In the runs conducted to determine the limit of detection in pure culture, the lowest DNA input at which all virulence genes of E. coli O157:H7 were detected in every triplicate was 12.5 ng (Figure 1). Five lower inputs (6.25, 3.13, 1.56, 0.78, and 0.39 ng) were tested, but not all virulence genes could be identified in every triplicate. Therefore, 12.5 ng was the lowest quantity of DNA that could be used to determine the E. coli serotype. The genes fliC, eae, rrsC, and stx were detected in the qPCR assay at all the DNA inputs tested in the limit of detection analysis (Figure 2).
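The limit-of-detection criterion used here (the lowest input at which every target gene is detected in every replicate) can be written as a small Python function. This is a sketch of the decision rule only. The function name `limit_of_detection`, the shortened gene list, and the example counts are hypothetical and do not reproduce the data in Figure 1.

```python
def limit_of_detection(results, genes):
    """Return the lowest DNA input (ng) at which all genes were detected
    (count >= 1) in every replicate, or None if no input qualifies.

    results: dict mapping input_ng -> list of replicate dicts (gene -> count).
    """
    passing = [
        input_ng
        for input_ng, replicates in results.items()
        if all(rep.get(g, 0) >= 1 for rep in replicates for g in genes)
    ]
    return min(passing) if passing else None


# Illustrative placeholder counts for three inputs, three replicates each.
genes = ["eae", "stx1", "stx2"]
results = {
    25.0: [{"eae": 9, "stx1": 7, "stx2": 8}] * 3,
    12.5: [{"eae": 4, "stx1": 3, "stx2": 2}] * 3,
    6.25: [
        {"eae": 2, "stx1": 1, "stx2": 1},
        {"eae": 1, "stx1": 1, "stx2": 0},  # stx2 missed in one replicate
        {"eae": 2, "stx1": 2, "stx2": 1},
    ],
}
lod = limit_of_detection(results, genes)
print(lod)
```

With these placeholder counts, the 6.25 ng input fails because one replicate lacks stx2, so the function returns 12.5.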

Spiked Ground Beef
Only the rrsC gene was detected in the normal sequencing runs of ground beef spiked with 10⁵ or 10⁶ cfu g⁻¹ E. coli O157:H7, and using the software-controlled depletion of the bovine genome or the enrichment of the E. coli O157:H7 genome did not significantly increase gene detection (Figure 3A,B). All the target genes, except stx2b, were detected in the 10⁷ cfu g⁻¹ E. coli O157:H7-inoculated ground beef, and software-controlled enrichment significantly increased the detection of virulence genes by two-fold (Figure 3C). Aliquots of each sample of spiked ground beef were also plated on agar prior to DNA extraction to determine the concentration of E. coli O157:H7 that remained after the stomaching, filtration, and centrifugation steps. The results indicated that there was a 1 log decrease in the concentration of E. coli O157:H7 from the amount inoculated in the meat to the amount present in the DNA extraction.
[...]lyzed to provide results on foodborne pathogen presence more quickly. The recent release of an accelerated basecaller, Dorado, by ONT, doubles the basecalling speed [18]. This could make higher-accuracy models more practical for real-time analysis in future studies.
A 1 h sequencing run time was sufficient to detect the virulence genes of interest in DNA extractions from pure cultures of E. coli O157:H7. However, variability was high between the independent replicate samples in both the timed runs and the limit-of-detection sequencing runs (Table 1). Other studies have also noted issues with inter-run variability [19,20]. Efforts were made to reduce variability between runs by using the same DNA extraction and having the same technician perform the experiments. The primary variable between runs was the flow cell (R9.4.1). Flow cells have a total of 2048 nanopores, but the number of pores available for sequencing varied between flow cells and was lower if the flow cell was being reused. However, an analysis of the timed runs and the meat runs found no correlation between pore availability and the number of reads generated or data produced. The variability between runs, regardless of run time, prompted the selection of a 3 h run time for DNA extracted from pure cultures to ensure sufficient data generation despite potential variability in flow cell performance. High-quality genome assemblies require 30× coverage to ensure that the entire genome is sequenced and to distinguish errors from sequence variations [21]. The virulence genes of interest were detected an average of 30 times during 1 h runs and 58 times during 2 h runs, suggesting that a 3 h run time would generate sufficient coverage despite potential variability.
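The relationship between read yield and coverage can be made concrete with the standard approximation that mean depth equals total sequenced bases divided by genome size. The Python sketch below applies it under two stated assumptions that are not taken from this study: a genome size of roughly 5.5 Mb for E. coli O157:H7 and a hypothetical mean read length of 5 kb. The function names are illustrative.

```python
from math import ceil

# Assumption: approximate E. coli O157:H7 genome size (bp), rounded.
GENOME_SIZE_BP = 5_500_000
# Assumption: hypothetical mean read length (bp), above the 1 kb filter.
MEAN_READ_LEN_BP = 5_000


def mean_depth(n_reads, mean_read_len_bp, genome_size_bp):
    """Mean coverage depth = total sequenced bases / genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def reads_needed(target_depth, mean_read_len_bp, genome_size_bp):
    """Smallest number of reads expected to reach the target mean depth."""
    return ceil(target_depth * genome_size_bp / mean_read_len_bp)


n = reads_needed(30, MEAN_READ_LEN_BP, GENOME_SIZE_BP)
print(n, mean_depth(n, MEAN_READ_LEN_BP, GENOME_SIZE_BP))
```

Under these assumptions, the number of reads overlapping a single-copy gene is a rough proxy for genome-wide depth, which is how the per-gene detection counts above relate to the 30× threshold. Real runs will deviate because read lengths vary and only reads that align to the target genome contribute.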
All E. coli have an indistinguishable core genome that has genes for housekeeping, metabolic, and transport functions [22]. The rrsC gene is a core gene, which is why it can only be used for identification to the species level. The accessory genome of E. coli contains genes that characterize the pathogenicity of specific pathotypes [22]. Therefore, to confirm the O157:H7 serotype, eae, fliC, stx, wzx, and wzy need to be identified in a sample. Serial dilutions of E. coli O157:H7 DNA were tested to determine the limit of detection for all virulence genes of interest. Some of the virulence genes could be detected in the lowest concentration tested, 0.39 ng, but the lowest concentration at which all virulence genes were detected in each triplicate was 12.5 ng. The recommended DNA input for the Field Sequencing Kit is 400 ng, and extracting DNA from a bacterial culture or meat sample provides sufficient DNA to input the recommended concentration into the library preparation. However, we anticipated the potential need to sequence DNA from only one or a few colonies isolated on selective agar, which would yield lower concentrations of DNA than extraction from a bacterial culture or meat sample. The ability to detect genes of interest from a colony would save time by eliminating the need to culture it to a higher concentration to meet the recommended DNA input concentration for the sequencing kits.
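The identification logic described in this paragraph can be summarized as a simple rule: rrsC alone supports identification only to the species level, whereas eae, fliC, stx, wzx, and wzy must all be present to confirm the O157:H7 profile. The Python sketch below encodes that rule. The function name and the output labels are hypothetical, and treating any detected gene whose name begins with "stx" as satisfying the stx requirement is an assumption made for illustration.

```python
# Markers that must all be present, in addition to at least one stx gene.
REQUIRED_MARKERS = {"eae", "fliC", "wzx", "wzy"}


def classify(detected):
    """Classify a sample from the set of gene names detected in it."""
    detected = set(detected)
    # Assumption: any stx subtype or subunit (e.g. stx1a, stx2b) counts as stx.
    has_stx = any(name.startswith("stx") for name in detected)
    if REQUIRED_MARKERS <= detected and has_stx:
        return "O157:H7 marker profile complete"
    if "rrsC" in detected:
        return "E. coli (species level only)"
    return "inconclusive"


print(classify({"rrsC"}))
print(classify({"rrsC", "eae", "fliC", "wzx", "wzy", "stx1a", "stx2a"}))
print(classify({"rrsC", "eae", "fliC", "wzx", "stx2a"}))  # wzy missing
```

A sample that lacks even one required marker falls back to the species-level call.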
The identification of the E. coli O157:H7 virulence genes of interest was difficult in the ground beef matrix due to the high prevalence of bovine DNA. In ground beef inoculated with 10⁵ or 10⁶ cfu g⁻¹ E. coli O157:H7, only the rrsC gene was detected. All virulence genes of interest, except stx2b, were identified in the 10⁷ cfu g⁻¹ E. coli O157:H7-inoculated ground beef, but most genes were detected fewer than 10 times. Software-controlled enrichment and depletion improved the detection of the virulence genes, but not enough to detect all of the genes needed to positively identify E. coli O157:H7. The infectious dose of E. coli O157:H7 is in the tens of cfu [23]; therefore, a testing method needs to be able to detect low concentrations of a pathogen to protect consumers. The inability to detect all virulence genes of interest, even at fairly high inoculum concentrations, prompted the testing of growth-enriched samples. The results showed that all virulence genes of interest were detected >100 times. Software-controlled enrichment further increased detection, but depletion generally decreased detection. This is likely because the concentration of E. coli DNA was higher than that of the bovine DNA after the growth enrichment, making depletion unnecessary. A previous in silico study conducted to determine the practicality of using long-read sequencing for foodborne pathogen detection indicated that growth enrichment would be necessary [14], and the results of the current study confirm that growth enrichment will be necessary to ensure the detection of very low concentrations of E. coli O157:H7 contamination in ground beef using sequencing.
Variability was noted in the number of times specific genes were detected in the samples, and this is likely due to differences in the gene copy number and the presence of nonpathogenic E. coli. The genes eae [24], fliC [25], wzx, and wzy [26] are single-copy genes. Generally, there is only one copy of each stx subtype gene as well, but more than one copy can be present [27]. These genes were typically identified in lower numbers than rrsC, of which there are multiple copies on the chromosome [28]. Additionally, in the uninoculated ground beef controls, the rrsC gene of E. coli, which is specific to the species level, was detected. However, none of the target virulence genes were identified. These results indicated the presence of nonpathogenic E. coli in the ground beef samples, which would also increase the concentration of the rrsC gene but not of the virulence genes, since they are only found in pathogenic strains. As discussed above, the accessory genes that define pathotypes are of primary interest in sequencing data to distinguish between nonpathogenic and pathogenic strains [22]. Selective growth enrichment prior to sequencing can amplify pathogenic bacteria to ensure that their detection is not masked by nonpathogenic strains.

Conclusions
Foodborne pathogen detection using sequencing has multiple advantages over culture-based methods. The procedure can be completed in three days, which is one day faster than the current method. Sample preparation and enrichment occur on day 1; DNA extraction, library preparation, and 24 h sequencing are started on day 2; and the results are analyzed on day 3. Sequencing is also not labor-intensive. After the enrichment, which is the same as the current FSIS method, DNA extraction, library preparation, and flow cell loading only take 2 h. Bioinformatic analysis is accomplished in less than 30 min. Sequencing the whole genome also provides the needed information for serotype determination and the identification of antibiotic resistance genes. Additionally, multiple pathogens could be targeted in a sequencing run. Long-read whole-genome sequencing shows promise as an efficient method for the detection of foodborne pathogens in ground beef.

Figure 1. The mean ± standard error of the number of times each virulence gene of interest was detected in the Escherichia coli O157:H7 DNA concentrations tested in the limit of detection assays.

Figure 2. The mean ± standard error of the Ct values for each Escherichia coli O157:H7 virulence gene of interest detected using qPCR in the DNA concentrations used in the limit of detection assays.

Figure 3. The number of times each virulence gene of interest was detected using regular, software-controlled depletion, or software-controlled enrichment long-read sequencing of ground beef inoculated with (A) 1 × 10⁵ CFU g⁻¹ Escherichia coli O157:H7, (B) 1 × 10⁶ CFU g⁻¹ E. coli O157:H7, (C) 1 × 10⁷ CFU g⁻¹ E. coli O157:H7, and (D) 1 × 10⁷ CFU g⁻¹ E. coli O157:H7 growth enriched for 24 h. (E) The number of times each gene of interest was detected in the media only, ground beef only, and media or ground beef only controls growth enriched for 24 h. (F) The mean ± standard deviation of Ct values in the qPCR assay for the virulence genes; * indicates the growth enrichment.

Table 1. Mean ± standard deviation of run parameters and target gene detection in timed runs.